Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

drewster
by New Contributor III
  • 12365 Views
  • 13 replies
  • 13 kudos

Resolved! Spark streaming autoloader slow second batch - checkpoint issues?

I am running a massive history of about 250 GB (~6 million phone call transcriptions, JSON read in as raw text) through a raw -> bronze pipeline in Azure Databricks using PySpark. The source is mounted storage that continuously has files added, and we do n...

Latest Reply
Brooksjit
New Contributor III
  • 13 kudos

Thank you for the explanation.

12 More Replies
sarveshpandey23
by New Contributor II
  • 2539 Views
  • 4 replies
  • 3 kudos

There is some issue in Column

I loaded a CSV file from my local PC into Databricks. Now, when I try to query the table by column, it throws an error. Note: select * from table_name works, but select column_name from table_name throws an error.

Latest Reply
Anonymous
Not applicable
  • 3 kudos

Thanks for the information; I will try to look into it further. Keep sharing such informative posts.

3 More Replies
Scorpius
by New Contributor II
  • 1740 Views
  • 1 replies
  • 0 kudos

Unregistering / Removing A SparkListener

Hey, so currently I'm using PySpark and a SparkListener to track metrics of my Spark jobs. The problem I can't seem to solve is why, in Databricks, the listener cannot be removed with removeSparkListener; it is still attached to the context. #...

Raymond_Garcia
by Contributor II
  • 2326 Views
  • 4 replies
  • 2 kudos

Migrating from Databricks Notebooks to IDE for Development

Hello, we are developers who have been creating a system in Databricks with Scala. We enabled the Git feature, so the project is in a repository. The project has a lot of notebooks and a lot of calls to other notebooks. Sometimes it is a little overw...

Latest Reply
Raymond_Garcia
Contributor II
  • 2 kudos

It is true that we can't work without Databricks, but we can develop in an IDE and send the JAR to Databricks. This will allow us to create unit tests and use the IDE's capabilities (e.g., fast navigation among classes).

3 More Replies
Phani1
by Valued Contributor II
  • 4720 Views
  • 3 replies
  • 2 kudos

Resolved! Is it possible to have multiple tabs in a dashboard? If not, is there any workaround?

Is it possible to have multiple tabs in a dashboard? If not, is there any workaround?

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey @Janga Reddy, hope you are well. Just wanted to see if you were able to find an answer to your question, and whether you would like to mark an answer as best? It would be really helpful for the other members too. Cheers!

2 More Replies
griffinw
by New Contributor III
  • 1491 Views
  • 2 replies
  • 1 kudos

Plotly in Python notebook - not seeing all graphs

Hello, I have written code that produces Plotly charts in a Python notebook (by facet). Each output is 4 line charts side by side with a few points highlighted. When I run this in a loop to produce more than 7 or 8 of my graphs, some of the features ...

Latest Reply
griffinw
New Contributor III
  • 1 kudos

Unfortunately, I'm not able to share, as it relates to proprietary company data. Essentially, I can produce any one Plotly graph like: [line graph with scatter] but if I produce, say, 8 of them (looping on a list of inputs), I get something like th...

1 More Replies
irfanijaz
by New Contributor
  • 707 Views
  • 0 replies
  • 0 kudos

Differently named storage accounts in different environments

Hi, I have a solution design question I am looking for some help with. We have 2 environments in Azure (dev and prod); each env has its own ADLS storage account, with a different name of course. Within Databricks code we are NOT leveraging the mou...

Maverick1
by Valued Contributor II
  • 5763 Views
  • 2 replies
  • 8 kudos

Resolved! How to get the list of all jobs available for a particular user?

As of now, if I try to list the jobs via the "list job" API, there is a limit of 25 jobs only. Is there a way to list all the jobs available/visible to a user?
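
One way to get past a per-request cap like the 25-job limit described above is to page through the results. The following is a minimal sketch, not a confirmed answer from the thread. It assumes the Jobs API 2.1 list endpoint (GET /api/2.1/jobs/list) accepts limit and offset parameters and returns a jobs array with a has_more flag. list_all_jobs and make_fetcher are hypothetical helper names, and the host and token are placeholders.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, List


def list_all_jobs(fetch_page: Callable[[int, int], Dict], page_size: int = 25) -> List[Dict]:
    """Collect every job by requesting successive pages until none remain."""
    jobs: List[Dict] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        batch = page.get("jobs", [])
        jobs.extend(batch)
        # Stop when the service reports no further pages or returns nothing.
        if not page.get("has_more") or not batch:
            break
        offset += len(batch)
    return jobs


def make_fetcher(host: str, token: str) -> Callable[[int, int], Dict]:
    """Build a page fetcher for a workspace (assumed endpoint and parameters)."""
    base = host.rstrip("/")

    def fetch_page(offset: int, limit: int) -> Dict:
        query = urllib.parse.urlencode({"offset": offset, "limit": limit})
        request = urllib.request.Request(
            f"{base}/api/2.1/jobs/list?{query}",
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)

    return fetch_page
```

Usage would look like list_all_jobs(make_fetcher("https://<workspace-host>", "<token>")). The result only contains jobs that the token's owner is permitted to see.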

Latest Reply
User16764241763
Honored Contributor
  • 8 kudos

Hello @Saurabh Verma, can the user generate an API token in the workspace and try to use the API?

1 More Replies
korilium
by New Contributor III
  • 10452 Views
  • 9 replies
  • 3 kudos

Databricks-connect invalid shard address

I want to use Databricks inside VS Code, so I need databricks-connect. I configure my settings using databricks-connect configure as follows: Databricks Host [https://adb-1409757184094616.16.azuredatabricks.net] Databricks Token [<my token>] Cl...

Latest Reply
Justin09
New Contributor II
  • 3 kudos

In case it helps anyone, I ran into this issue and had to remove the trailing / from the host name. It used to work fine with the trailing / so something must have changed.
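
The fix described above can be applied before the value is ever typed into the configuration prompt. This is a minimal sketch; normalize_host is a hypothetical helper name and is not part of databricks-connect.

```python
def normalize_host(host: str) -> str:
    """Strip surrounding whitespace and any trailing slashes from a workspace URL."""
    return host.strip().rstrip("/")


# Host value taken from the question above, with the problematic trailing slash.
print(normalize_host("https://adb-1409757184094616.16.azuredatabricks.net/"))
```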

8 More Replies
alpha
by New Contributor III
  • 7985 Views
  • 2 replies
  • 5 kudos

Resolved! Connecting to DataLake Gen2 from Azure Databricks with Private Endpoint

Hi, I have a Data Lake Gen2 account with a VNet and private endpoint, and a Databricks workspace in the same VNet. I am trying to access the data lake from Databricks, but I keep getting an error when I allow access only from selected networks in the data lake. I get error w...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

You can always check the tutorial regarding ADLS and private link under this link https://community.databricks.com/s/feed/0D53f00001eQGOHCA4

1 More Replies
RiyazAli
by Valued Contributor
  • 2541 Views
  • 1 replies
  • 3 kudos

Errors in notebooks of Scalable Machine Learning with Apache Spark course in Databricks academy.

Hi there, I'm following the course mentioned from Databricks Academy. I downloaded the .dbc archive and am working alongside the videos from the academy. In the ML-08 - Hyperopt notebook, I see the following error in cmd 13. best_hyperparam = fmin(fn=objectiv...

[Attached images: hyperopt_implementation; hyperopt problem with "max_features"]
Latest Reply
RiyazAli
Valued Contributor
  • 3 kudos

Tagging @Kaniz Fatma, as there was no response whatsoever! By any chance, do you know how to resolve these errors in the notebook? Thanks!

hari
by Contributor
  • 12440 Views
  • 9 replies
  • 2 kudos

Resolved! Max job concurrency per workspace

Per the documentation, a workspace is limited to 1k concurrent job runs. Can somebody clarify how the concurrency limit is set, i.e.: Is it 1k concurrent runs across all jobs in the workspace? Is it 1k concurrent runs for a single job? Also, is there any way...

Latest Reply
hari
Contributor
  • 2 kudos

Hi @Kaniz Fatma, Weners gave some great suggestions on the issue we are dealing with, but some confirmation on the question from the Databricks side would be much appreciated.

8 More Replies
Bittu6084
by New Contributor II
  • 9218 Views
  • 3 replies
  • 5 kudos

Resolved! How can we alter table with auto increment column for a delta table

How can we alter a Delta table to add an auto-increment column? I have tried this, but it is not working:
ALTER TABLE dbgtpTest.student ADD COLUMN Student_Id identity(100,1)
Any suggestions will be helpful.

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

@Rakesh Reddy Badam, for ALTER ... ADD COLUMN, the docs mention only SYNC IDENTITY. If you want to add an identity column to an existing table, just create a new table with an identity column and then copy the data.
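
The create-and-copy approach suggested above could look roughly like the sketch below, which only builds the two SQL statements as strings. It assumes Delta's GENERATED ALWAYS AS IDENTITY (START WITH ... INCREMENT BY ...) column syntax on a BIGINT column. identity_migration_sql is a hypothetical helper, and the target table name and the name column are made up for illustration, since the original table's schema is not shown in the thread.

```python
from typing import Dict, List


def identity_migration_sql(
    source: str,
    target: str,
    id_column: str,
    columns: Dict[str, str],
    start: int = 100,
    step: int = 1,
) -> List[str]:
    """Return a CREATE TABLE and an INSERT statement for the copy approach."""
    column_defs = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    column_names = ", ".join(columns)
    create = (
        f"CREATE TABLE {target} ({id_column} BIGINT GENERATED ALWAYS AS IDENTITY "
        f"(START WITH {start} INCREMENT BY {step}), {column_defs}) USING DELTA"
    )
    # The identity column is left out of the INSERT so that Delta generates it.
    copy = f"INSERT INTO {target} ({column_names}) SELECT {column_names} FROM {source}"
    return [create, copy]


# Hypothetical schema: only Student_Id and dbgtpTest.student come from the question.
for statement in identity_migration_sql(
    "dbgtpTest.student", "dbgtpTest.student_with_id", "Student_Id", {"name": "STRING"}
):
    print(statement)
```

In a notebook, each returned statement could then be passed to spark.sql(...). Once the copy is verified, the old table can be dropped and the new one renamed.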

2 More Replies