Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Bill
by New Contributor III
  • 3522 Views
  • 5 replies
  • 2 kudos

Resolved! How to access tables created in 2017

In 2017, while working on my master's degree, I created some tables that I would like to access again. Back then I could just write SQL and find them, but today that doesn't work. I suspect it has something to do with Delta Lake. What do I have to do to...

Latest Reply
Bill
New Contributor III
  • 2 kudos

That did it. Thanks

4 More Replies
Anonymous
by Not applicable
  • 2005 Views
  • 1 reply
  • 1 kudos

Resolved! Unable to start cluster on E2 Workspace

Hello Community, I'm trying to create and start my first cluster on my E2 Databricks Workspace on AWS. The cluster is created, but immediately after STARTING its status goes to TERMINATING. Logs provided by Databricks show ...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Update: It was an error on my side with the KMS key.

Taha_Hussain
by Databricks Employee
  • 1640 Views
  • 1 reply
  • 6 kudos

Databricks Office Hours

Our next Office Hours session is scheduled for May 18th from 8:00 am to 9:00 am PT. Do you have questions about how to set up or use Databricks? Do you want to learn more about the best practices for deploying your use case or tip...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 6 kudos

Just registered!

Hubert-Dudek
by Databricks MVP
  • 1869 Views
  • 0 replies
  • 20 kudos

From Databricks Runtime 10.5, you can get metadata using the hidden _metadata column. Currently, the column contains input file information (file_path, file_name, file_size, and file_modification_time).
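A minimal sketch of how the hidden column might be selected in PySpark, assuming a DataFrame read from a file-based source. The helper name with_file_metadata and the load path in the usage comment are hypothetical; the field names are the four listed in the post.

```python
# Fields of the hidden _metadata column, as listed in the post.
METADATA_FIELDS = ["file_path", "file_name", "file_size", "file_modification_time"]


def with_file_metadata(df, fields=METADATA_FIELDS):
    """Return df with the chosen _metadata fields appended as ordinary columns.

    The hidden column is not included in "*", so each field has to be
    selected explicitly. `df` is assumed to be a Spark DataFrame that was
    read from files (hypothetical helper, not part of any Databricks API).
    """
    return df.select("*", *[f"_metadata.{name}" for name in fields])


# Hypothetical usage in a notebook (the path is a placeholder):
# df = spark.read.format("csv").load("/path/to/files")
# with_file_metadata(df).show()
```

Selecting individual fields rather than the whole struct keeps the result flat, which tends to be easier to filter on, for example by file_name.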

Ashley1
by Contributor
  • 4260 Views
  • 5 replies
  • 1 kudos

Resolved! Can ADLS be mounted in DBFS using only ADLS account key?

I realise this is not an optimal configuration, but I'm trying to pull together a POC and I'm not at the point that I wish to ask the AAD admins to create an application for OAuth authentication. I have been able to use direct references to the ADLS co...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hey there @Ashley Betts, thank you for posting your question, and it is great that you found the solution! Would you be happy to mark the answer as best so that other members can find the solution more quickly? Cheers!

4 More Replies
Lincoln_Bergeso
by New Contributor II
  • 10837 Views
  • 8 replies
  • 4 kudos

Resolved! How do I read the contents of a hidden file in a Spark job?

I'm trying to read a file from a Google Cloud Storage bucket. The filename starts with a period, so Spark assumes the file is hidden and won't let me read it. My code is similar to this:
from pyspark.sql import SparkSession
spark = SparkSession.build...

Latest Reply
Dan_Z
Databricks Employee
  • 4 kudos

I don't think there is an easy way to do this. If you were able to get around these constraints, you would also break very basic functionality (like being able to read Delta tables). I suggest you run a rename job first and then read the renamed file.
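A rough sketch of what such a rename step could look like, using the local filesystem as a stand-in for the bucket. The function name copy_hidden_files and the visible_ prefix are hypothetical, and the sketch copies rather than renames so the original files stay untouched. For Google Cloud Storage the same idea would go through a storage client instead of pathlib.

```python
import shutil
from pathlib import Path


def copy_hidden_files(src_dir, dst_dir, prefix="visible_"):
    """Copy files whose names start with '.' or '_' to dst_dir under a visible name.

    Spark's file listing skips names with those leading characters, so the
    copies get a name that starts with `prefix` instead. Returns the list of
    new paths. Hypothetical helper for illustration only.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(src.iterdir()):
        if path.is_file() and path.name.startswith((".", "_")):
            # Strip the leading dots/underscores, then add the visible prefix.
            target = dst / (prefix + path.name.lstrip("._"))
            shutil.copy2(path, target)
            copied.append(target)
    return copied
```

After this step, the Spark job would read from the destination directory rather than from the original location.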

7 More Replies
patojo94
by New Contributor II
  • 8491 Views
  • 0 replies
  • 0 kudos

Adding deduplication method to spark streaming

Hi everyone, I am having some trouble adding a deduplication step to a file stream that is already running. The code I am trying to add is this:
df = df.withWatermark("arrival_time", "20 minutes")\
    .dropDuplicates(["event_id", "arrival_time"])...

Jin_Kim
by New Contributor II
  • 7639 Views
  • 2 replies
  • 4 kudos

Resolved! How to use multiple Spark streaming jobs connecting to one job cluster

Hi, we have a scenario where we need to deploy 15 Spark streaming applications on Databricks, reading from Kafka, to a single job cluster. We tried the following approach:
1. create job 1 with new job cluster (C1)
2. create job2 pointing to C1...
3. create job15...

Latest Reply
Jin_Kim
New Contributor II
  • 4 kudos

@Hubert Dudek, thanks a lot for responding. When we have a setup like this, if one task fails, it will not terminate the entire job, right? Since the job is continuously running as a streaming app, is it possible to add a new task to the job (while i...

1 More Replies
Carneiro
by New Contributor II
  • 8603 Views
  • 2 replies
  • 2 kudos

Resolved! Stuck in "Running Command ..."

Hi, since yesterday, for no known reason, some commands that used to run daily are now stuck in a "Running command" state. Commands such as:
dataframe.toPandas()
dataframe.show(n=1)
dataframe.description()
dataframe.write.format("csv").save(location)
ge...

Latest Reply
jose_gonzalez
Databricks Employee
  • 2 kudos

Hi @Luiz Carneiro, could you split your Spark actions into more CMD cells (paragraphs) and run them one at a time to check where the extra time is being spent? Also, Pandas only runs on your driver. Have you tried using the Python or Scala APIs instead? in ca...
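One way to see which action is slow, besides splitting cells, is to wrap each action in a small timer. This is only a sketch: the timed helper and the timings dictionary are hypothetical names, and the summation below is a stand-in for a real Spark action such as dataframe.show(n=1).

```python
import time
from contextlib import contextmanager

# Elapsed seconds per label, filled in as each timed block finishes.
timings = {}


@contextmanager
def timed(label):
    """Record and print how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = time.perf_counter() - start
        print(f"{label}: {timings[label]:.3f}s")


# Stand-in workload; in a notebook this would be one Spark action per block.
with timed("show"):
    sum(range(1000))
```

Because Spark is lazy, only actions trigger work, so each timed block should contain exactly one action for the numbers to be meaningful.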

1 More Replies
RengarLee
by Contributor
  • 5620 Views
  • 5 replies
  • 0 kudos

Resolved! How to improve Spark Streaming writer Input Rate and Processing rate?

Hi! I have many questions about Spark Streaming and Event Hubs. Can you help me? Q1: How can I improve the Spark Streaming writer's input rate and processing rate? I connect to Azure Event Hubs using Spark Streaming (Azure Databricks), but I found that if I use display, this ...

Latest Reply
RengarLee
Contributor
  • 0 kudos

My problem is that the value set with setMaxEventsPerTrigger does not match numInputRow.

4 More Replies
shan_chandra
by Databricks Employee
  • 4808 Views
  • 1 reply
  • 3 kudos

Resolved! How to execute matplotlib animations in a Databricks notebook?

How to execute matplotlib animations in a Databricks notebook?

Latest Reply
shan_chandra
Databricks Employee
  • 3 kudos

Please refer to the example code below, and use displayHTML(ani.to_jshtml()) to execute matplotlib animations in a Databricks notebook:
import matplotlib.pyplot as plt
import matplotlib.animation
import numpy as np
t = np.linspace(0,2*np.pi)
x = np.si...
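Since the example in the reply is cut off, here is a separate minimal sketch of the same approach, not a reconstruction of the original code. It animates a sine curve with FuncAnimation and converts it with to_jshtml(). The frame count, axis limits, and the update function are illustrative choices. displayHTML exists only inside Databricks notebooks, so the sketch falls back to printing the HTML size elsewhere.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation

# Sample points for one period of a sine wave (illustrative data).
t = np.linspace(0, 2 * np.pi, 20)

fig, ax = plt.subplots()
ax.set_xlim(0, 2 * np.pi)
ax.set_ylim(-1.1, 1.1)
(line,) = ax.plot([], [])


def update(frame):
    """Draw the curve up to and including the current frame index."""
    line.set_data(t[: frame + 1], np.sin(t[: frame + 1]))
    return (line,)


ani = FuncAnimation(fig, update, frames=len(t), interval=50)
html = ani.to_jshtml()  # self-contained HTML/JavaScript player
plt.close(fig)  # avoid also rendering a static copy of the figure

try:
    displayHTML(html)  # available only in Databricks notebooks
except NameError:
    print(f"generated {len(html)} characters of HTML")
```

The HTML string embeds every frame as an image, so keeping the frame count small helps keep the notebook output light.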

Orianh
by Valued Contributor II
  • 3745 Views
  • 2 replies
  • 2 kudos

Resolved! pyodbc read only connection.

Hey guys, is there a way to open a read-only pyodbc connection with the Simba Spark driver? At the moment, I'm able to execute queries such as select, delete, insert into - basically every SQL statement using pyodbc. I tried to open pyodbc connection but ...

Latest Reply
Hubert-Dudek
Databricks MVP
  • 2 kudos

The readonly=True option works only with some drivers. Instead, create an additional user that is granted read-only permissions.

1 More Replies