Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

WillJMSFT
by New Contributor III
  • 2603 Views
  • 6 replies
  • 7 kudos

Resolved! How to import SqlDWRelation from com.databricks.spark.sqldw

Hello, All - I'm working on a project using the SQL DataWarehouse connector built into Databricks (https://docs.databricks.com/data/data-sources/azure/synapse-analytics.html). From there, I'm trying to extract information from the logical plan / logi...

Latest Reply
WillJMSFT
New Contributor III

@Werner Stinckens​  Thanks for the reply! The SQL DW Connector itself is working just fine and I can retrieve the results from the SQL DW. I'm trying to extract the metadata (i.e. the Server, Database, and Table name) from the logical plan (or throu...
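Not part of the thread, but a minimal sketch of one way to peek at the logical plan from PySpark (the table name is a placeholder and `_jdf` is an internal handle, not a stable public API):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.table("some_table")  # placeholder for the Synapse/SQL DW read

# df._jdf is the underlying JVM DataFrame; queryExecution() exposes the
# Catalyst QueryExecution, whose analyzed plan can be rendered as text.
plan_text = df._jdf.queryExecution().analyzed().toString()
print(plan_text)  # scan this text for server/database/table identifiers
```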

5 More Replies
Dileep_Vidyadar
by New Contributor III
  • 3035 Views
  • 7 replies
  • 5 kudos

Not Able to create Cluster on Community Edition for 3-4 days.

I have been learning PySpark on Community Edition for about a month. It's been great, but for the last 3-4 days I have been facing issues while creating a cluster. Sometimes it takes 30 to 60 minutes to create a cluster, and sometimes it is not even creating a Cl...

Latest Reply
Anonymous
Not applicable

@Dileep Vidyadara​  - If your question was fully answered by @Hubert Dudek​, would you be happy to mark his answer as best?

6 More Replies
All_Users
by New Contributor II
  • 1108 Views
  • 0 replies
  • 1 kudos

How do you upload a folder of csv files from your local machine into the Databricks platform?

I am working with time-series data, where each day is a separate csv file. I have tried to load a zip file to FileStore but then cannot use the magic command to unzip, most likely because it is in the tmp folder. Is there a workaround for this proble...
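No reply is recorded, but a minimal sketch of one common workaround, assuming the zip was uploaded to FileStore (all paths are hypothetical): Python's zipfile can read the archive through the /dbfs FUSE mount, avoiding %sh and the tmp folder entirely.

```python
import zipfile

src = "/dbfs/FileStore/daily_csvs.zip"  # hypothetical upload location
dst = "/dbfs/FileStore/daily_csvs"      # extraction target, also on DBFS

# /dbfs is the FUSE mount of DBFS, so plain Python file I/O works here.
with zipfile.ZipFile(src, "r") as zf:
    zf.extractall(dst)

# The extracted folder can then be read as a single DataFrame:
# spark.read.csv("dbfs:/FileStore/daily_csvs", header=True)
```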

BrendanTierney
by New Contributor II
  • 2795 Views
  • 6 replies
  • 3 kudos

Resolved! Community Edition is not allocating Cluster

I've been trying to use the Community Edition for the past 3 days without success. I go to run a notebook and it begins to allocate the cluster, but it never finishes. Sometimes it times out after 15 minutes. Waiting for cluster to start: Finding i...

Latest Reply
Anonymous
Not applicable

@Dileep Vidyadara - There did seem to be a problem around the time you posted. For future reference, when you're having trouble you can check what's going on via the AWS Databricks Status Page. Let us know if you have any other questions.

5 More Replies
SepidehEb
by Contributor III
  • 2987 Views
  • 6 replies
  • 7 kudos

Resolved! How to get a minor DBR image?

In short, we aim to add a step to a CI job that would run tests in a container which should mimic the DBR of our clusters – currently we use 7.3. We are considering using one of the databricksruntime images (possibly standard:7.x for now, https://hub...

Latest Reply
Atanu
Esteemed Contributor

Hi @Sepideh Ebrahimi, since the cluster runtime is Databricks proprietary, you cannot run it locally. As @Werner Stinckens said, you can build your own image, but it has to run in a cluster. But there is Databricks Connect (https://docs.databricks.com/dev-...

5 More Replies
sunil_smile
by Contributor
  • 4710 Views
  • 5 replies
  • 6 kudos

Apart from notebooks, is it possible to deploy an application (PySpark or R+Spark) as a package or file and execute it in Databricks?

Hi, with the help of databricks-connect I was able to connect the cluster to my local IDE (PyCharm and RStudio desktop) to develop the application, and I committed the code to Git. When I try to add that repo to the Databricks workspac...

Latest Reply
Atanu
Esteemed Contributor

Maybe you will be interested in our Databricks Connect. Not sure if it resolves your issue of connecting a 3rd-party tool and setting up your supported IDE: https://docs.databricks.com/dev-tools/databricks-connect.html

4 More Replies
Abela
by New Contributor III
  • 5711 Views
  • 3 replies
  • 7 kudos

Resolved! Databricks drop and remove s3 storage files safely

After dropping a Delta table using the DROP command in Databricks, is there a way to drop the S3 files in Databricks without using the rm command? Looking for a solution where junior developers can safely drop a table without messing with the rm command where...

Latest Reply
jose_gonzalez
Moderator

Hi @Alina Bella, like @Hubert Dudek mentioned, we have a best-practice guide for dropping managed tables. You can find the docs here.
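As a rough illustration of the managed-table approach (table names are hypothetical; `spark` is the session a Databricks notebook provides): a table created without an explicit LOCATION is managed, so DROP TABLE removes the underlying files as well and no rm is needed.

```python
# Create a managed Delta table: no LOCATION clause, so Databricks owns the files.
spark.sql("""
    CREATE TABLE events_managed
    USING DELTA
    AS SELECT * FROM source_events
""")

# Dropping a managed table removes both the metadata and the data files.
spark.sql("DROP TABLE events_managed")
```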

2 More Replies
itay
by New Contributor II
  • 1457 Views
  • 2 replies
  • 1 kudos

Streaming with runOnce and groupBy window queries

I have a streaming job running a groupBy query with a window of 3 days. The query is searching for different types of events. The stream is configured with runOnce and there is a job scheduled every hour. Now, I'm not sure what data is processed ea...

Latest Reply
jose_gonzalez
Moderator

Hi @itay k, you will need to take a look at the Progress Reporter. This will show the micro-batch JSON metrics, for example the metric called "numInputRows", which displays the number of input rows processed in the micro-batch. You will...
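A minimal sketch of reading those metrics (the rate source and noop sink are stand-ins; `spark` is the notebook-provided session):

```python
# Start a small run-once stream purely to have a StreamingQuery to inspect.
query = (
    spark.readStream.format("rate").load()
    .writeStream.format("noop")
    .trigger(once=True)
    .start()
)
query.awaitTermination()

# lastProgress holds the most recent micro-batch's JSON metrics as a dict;
# recentProgress keeps the last several batches.
if query.lastProgress is not None:
    print(query.lastProgress["numInputRows"])
```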

1 More Replies
kmartin62
by New Contributor III
  • 3781 Views
  • 9 replies
  • 4 kudos

Resolved! Configure Databricks (spark) context from PyCharm

Hello. I'm trying to connect to Databricks from my IDE (PyCharm) and then run delta table queries from there. However, the cluster I'm trying to access has to give me permission. In this case, I'd go to my cluster, run the cell which gives me permiss...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

"I'm trying to connect to Databricks from my IDE (PyCharm) and then run delta table queries from there."If you are going to deploy later your code to databricks the only solutions which I see is to use databricks-connect or just make development envi...

8 More Replies
prasadvaze
by Valued Contributor II
  • 14565 Views
  • 8 replies
  • 3 kudos

Resolved! How to make delta table column values case-insensitive?

We have many Delta tables with string columns as the unique key (the PK in a traditional relational DB), and we don't want to insert a new row just because the key value differs only in case. It's a lot of code change to use an upper/lower function on every column-value compare (in ...

Latest Reply
lizou
Contributor II

Well, the unintended benefit is that I am now using int/bigint surrogate keys for all tables (preferred in a DW). All joins are made on integer data types, and query efficiency is also improved. The string matching using upper() is done only in ETL when com...
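A minimal sketch of that ETL-side comparison (table and column names are hypothetical; `spark` is the notebook session): the case-insensitive key match happens once in a MERGE at load time, and downstream joins use the integer surrogate keys.

```python
# Compare string keys case-insensitively only at load time; new rows are
# inserted only when no key matches regardless of case.
spark.sql("""
    MERGE INTO target t
    USING updates s
      ON upper(t.business_key) = upper(s.business_key)
    WHEN NOT MATCHED THEN INSERT *
""")
```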

7 More Replies
Anonymous
by Not applicable
  • 1175 Views
  • 1 replies
  • 1 kudos

Resolved! Access to Cluster Logs for non-admins

Suppose I have a DevOps team that needs near real-time access to cluster logs to troubleshoot job failures. What is the best way for me to grant access to view logs without granting them admin access?

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Please use the logging option in the cluster settings and set the destination for sending logs to other Azure Blob or S3 storage (it needs to be mounted first):
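As an illustration, this is roughly how that log destination appears in a Clusters API 2.0 create/edit payload (the destination path and cluster details are assumptions; the same setting is available in the cluster UI):

```python
import json

cluster_spec = {
    "cluster_name": "jobs-cluster",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
    # Driver and executor logs are delivered to this location every few
    # minutes; DevOps users can read them there without admin rights.
    "cluster_log_conf": {"dbfs": {"destination": "dbfs:/mnt/cluster-logs"}},
}
print(json.dumps(cluster_spec, indent=2))
```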

User16857281869
by New Contributor II
  • 1650 Views
  • 1 replies
  • 1 kudos

Resolved! Why do I see a cost explosion in my blob storage account (DBFS storage, blob storage, ...) for my structures streaming job?

It's usually one or more of the following reasons: 1) If you are streaming into a table, you should be using the .trigger option to specify the frequency of checkpointing. Otherwise, the job will call the storage API every 10ms to log the transaction data...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Please:
  • Mount cheaper storage (LRS) to a custom mount and set your checkpoints there.
  • Clear data regularly.
  • If you are using foreach/foreachBatch in a stream, it will save every DataFrame to DBFS.
  • Remember not to use display() in production.
  • If on th...
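A minimal sketch of the first two points (paths are assumptions): an explicit trigger interval plus a checkpoint placed on the cheaper mounted storage.

```python
# Explicit trigger cadence avoids the default tight polling loop, and the
# checkpoint lands on the cheaper (LRS) mounted storage.
stream = (
    spark.readStream.format("delta").load("/mnt/data/events")
    .writeStream.format("delta")
    .option("checkpointLocation", "/mnt/cheap-lrs/checkpoints/events")
    .trigger(processingTime="1 minute")
    .start("/mnt/data/events_out")
)
```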

User16857281869
by New Contributor II
  • 1248 Views
  • 1 replies
  • 1 kudos

Resolved! What is the best way to do time series analysis and forecasting with Spark?

We have developed a library on Spark which makes typical operations on time series much simpler. You can check the repo on GitHub for more info. You could also check out one of our blogs, which demos an implementation of a forecasting use case with S...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

Currently on Databricks there is MLflow with a forecasting option – please check it.

brickster_2018
by Esteemed Contributor
  • 939 Views
  • 1 replies
  • 0 kudos
Latest Reply
Hubert-Dudek
Esteemed Contributor III

This is a list of configuration keys to enable or alter the blacklist mechanism:
  • spark.blacklist.enabled – set to true
  • spark.blacklist.task.maxTaskAttemptsPerExecutor (1 by default)
  • spark.blacklist.task.maxTaskAttemptsPerNode (2 by default)
  • spark.blacklis...
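For illustration, a sketch of setting those keys when building a session (on Databricks these would normally go in the cluster's Spark config instead; the values shown are the defaults quoted above):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Enable the blacklist mechanism and keep the default attempt limits.
    .config("spark.blacklist.enabled", "true")
    .config("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
    .config("spark.blacklist.task.maxTaskAttemptsPerNode", "2")
    .getOrCreate()
)
```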
