Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ashley1
by Contributor
  • 4056 Views
  • 2 replies
  • 0 kudos

Resolved! JDBC connectivity via workspace URL when No Public IP is selected.

Hi all, I think I might be missing something regarding No Public IP clusters. I have set this option on an Azure workspace and set up the appropriate subnets. To my surprise, when I went to set up a JDBC connection to the cluster, the JDBC connec...

Latest Reply
Anonymous
Not applicable
  • 0 kudos

Hey there @Ashley Betts, hope you are well. Just wanted to see if you were able to find an answer to your question. Would you like to mark an answer as best? It would be really helpful for the other members too. Cheers!

1 More Replies
Kash
by Contributor III
  • 21320 Views
  • 18 replies
  • 13 kudos

Resolved! HELP! Converting GZ JSON to Delta causes massive CPU spikes and ETLs take days!

Hi there, I was wondering if I could get your advice. We would like to create a bronze Delta table using GZ JSON data stored in S3, but each time we attempt to read and write it, our cluster's CPU spikes to 100%. We are not doing any transformations but s...

Latest Reply
Kash
Contributor III
  • 13 kudos

Hi Kaniz, thanks for the note, and thank you everyone for the suggestions and help. @Joseph Kambourakis I added your suggestion to our load but did not see any change in how our data loads or the time it takes to load data. I've done some additional ...

17 More Replies
amichel
by New Contributor III
  • 9907 Views
  • 3 replies
  • 4 kudos

Resolved! Recommended way to integrate MongoDB as a streaming source

Current state: Data is stored in MongoDB Atlas, which is used extensively by all services. The data lake is hosted in the same AWS region and connected to MongoDB over private link. Requirements: Streaming pipelines that continuously ingest, transform/analyze and ...

Latest Reply
robwma
New Contributor III
  • 4 kudos

Another option, if you'd like to use Spark for ingestion, is the new Spark Connector V10.0, which supports Spark Structured Streaming: https://www.mongodb.com/developer/languages/python/streaming-data-apache-spark-mongodb/. If you use Kafka, th...

2 More Replies
User16826994223
by Honored Contributor III
  • 2697 Views
  • 2 replies
  • 0 kudos

Resolved! Error during setup of cluster

Unexpected Launch Failure: An unexpected error was encountered while setting up the cluster. Please retry and contact Azure Databricks if the problem persists. Internal error message: Timeout while placing node.

Latest Reply
Will1
New Contributor III
  • 0 kudos

Ensure that the CNO$ account has Full Control on the CNO and the Computers container; add the CNO$ account (CNO computer object) to the Local Admins group; finally, add CNO$ to the Domain Admins group. Regards, Willjoe

1 More Replies
data_testing1
by New Contributor III
  • 91965 Views
  • 6 replies
  • 13 kudos

Can Databricks be used locally to learn it, or is it cloud only?

I'm tired of telling clients or referrals I don't know Databricks, but it seems like the only option is to have a big AWS account and then use Databricks on that data. Can I download it locally for training and upskilling with Python, or is it only for cl...

Latest Reply
Anonymous
Not applicable
  • 13 kudos

Thanks for linking directly to the Docker image, @Hubert Dudek! And thanks for the info, @Prabakar Ammeappin, @Amit Nainawati and @Andrew Schell. Let us know if you have more questions! If not, choose a best answer in this thread and let us know how...

5 More Replies
MadelynM
by Databricks Employee
  • 7310 Views
  • 2 replies
  • 3 kudos

How do I move existing workflows and jobs running on an all-purpose cluster to a shared jobs cluster?

A Databricks cluster is a set of computation resources that performs the heavy lifting of all of the data workloads you run in Databricks. Databricks provides a number of options when you create and configure clusters to help you get the best perform...

Screenshots: left navigation bar with Data Science & Engineering selected; left nav with Workflows selected.
Latest Reply
Anonymous
Not applicable
  • 3 kudos

@Doug Harrigan Thanks for your question! @Prabakar Ammeappin linked above to our Docs page that mentions a bit more about the recent (April) version update/change: "This release fixes an issue that removed the Swap cluster button from the Databrick...

1 More Replies
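For moving jobs off an all-purpose cluster, a minimal sketch of a multi-task job definition in which every task shares one job cluster may help. It assumes the Jobs API 2.1 request shape (a `job_clusters` list whose `job_cluster_key` each task references); the job name, notebook paths, `spark_version` and `node_type_id` values are hypothetical placeholders to replace with values valid in your workspace.

```python
import json

# Hypothetical placeholder settings: replace with values valid in your workspace.
SHARED_CLUSTER_KEY = "shared_job_cluster"
NEW_CLUSTER = {
    "spark_version": "11.3.x-scala2.12",  # placeholder runtime version string
    "node_type_id": "Standard_DS3_v2",    # placeholder Azure node type
    "num_workers": 2,
}


def build_shared_cluster_job(name, notebook_paths):
    """Build a Jobs API 2.1-style payload where every task runs on one shared job cluster.

    Tasks run in sequence: each task depends on the one before it.
    """
    tasks = []
    for i, path in enumerate(notebook_paths):
        task = {
            "task_key": f"task_{i + 1}",
            # Reference the shared job cluster instead of an all-purpose cluster.
            "job_cluster_key": SHARED_CLUSTER_KEY,
            "notebook_task": {"notebook_path": path},
        }
        if i > 0:
            task["depends_on"] = [{"task_key": f"task_{i}"}]
        tasks.append(task)
    return {
        "name": name,
        "job_clusters": [{"job_cluster_key": SHARED_CLUSTER_KEY, "new_cluster": NEW_CLUSTER}],
        "tasks": tasks,
    }


def undefined_cluster_keys(job):
    """Return job_cluster_key values used by tasks but not declared under job_clusters."""
    declared = {c["job_cluster_key"] for c in job.get("job_clusters", [])}
    used = {t["job_cluster_key"] for t in job["tasks"] if t.get("job_cluster_key")}
    return sorted(used - declared)


if __name__ == "__main__":
    job = build_shared_cluster_job("nightly_etl", ["/Repos/etl/ingest", "/Repos/etl/transform"])
    print(json.dumps(job, indent=2))
```

The resulting payload could be sent to the Jobs `create` (or `reset`, for an existing job) endpoint; check the Jobs API reference for your workspace before relying on the exact field set.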
NicolasEscobar
by New Contributor II
  • 11398 Views
  • 7 replies
  • 5 kudos

Resolved! Job fails after runtime upgrade

I have a job running with no issues on Databricks Runtime 7.3 LTS. When I upgraded to 8.3, it fails with the error An exception was thrown from a UDF: 'pyspark.serializers.SerializationError'... SparkContext should only be created and accessed on the driv...

Latest Reply
User16873042682
New Contributor II
  • 5 kudos

Adding to @Sean Owen's comments: the only reason this works is that the optimizer evaluates it locally rather than creating a context on the executors and evaluating it there.

6 More Replies
Raymond_Garcia
by Contributor II
  • 8436 Views
  • 1 replies
  • 1 kudos

Resolved! query array in SQL

Hello, I have a Databricks question I was not able to answer myself. I have this query:

select count(*) from table
where object[0].value is not null and object[0].value.value1 = "s"
and created_year = 2022 and created_month = 7 and created_day = 4

you can se...

Latest Reply
Raymond_Garcia
Contributor II
  • 1 kudos

SELECT count(*)
FROM (
  SELECT explode(mmycolumn)
  FROM table
  WHERE created_year = 2022 and created_month = 7 and created_day = 5
)
WHERE col.field is not null and col.field.field != "signal"

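The reply works because `explode` produces one row per array element (exposed as `col`), so the filter checks every element rather than only `object[0]`, and the partition filters run inside the subquery. A minimal plain-Python emulation of that query's semantics, assuming rows are dicts and reusing the reply's placeholder names (`mmycolumn`, `field`, `"signal"`), could look like this; it illustrates the logic only and is not Spark code.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def explode(rows: Iterable[Dict[str, Any]], column: str) -> Iterator[Optional[Dict[str, Any]]]:
    """Yield one value per array element; a NULL or missing array yields nothing, as with SQL explode()."""
    for row in rows:
        for element in row.get(column) or []:
            yield element


def count_matching(rows: Iterable[Dict[str, Any]], column: str, excluded: str, partition: Dict[str, Any]) -> int:
    """Emulate the reply's query:

    SELECT count(*) FROM (SELECT explode(<column>) FROM table WHERE <partition filters>)
    WHERE col.field IS NOT NULL AND col.field.field != <excluded>
    """
    # Partition filters run before exploding, as in the reply's subquery.
    selected = (r for r in rows if all(r.get(k) == v for k, v in partition.items()))
    count = 0
    for col in explode(selected, column):
        field = col.get("field") if col else None
        inner = field.get("field") if field else None
        # In SQL a comparison with NULL is never true, so missing values are not counted.
        if inner is not None and inner != excluded:
            count += 1
    return count


# Hypothetical sample rows; every array element is checked, not only index 0.
sample_rows = [
    {"created_year": 2022, "created_month": 7, "created_day": 5,
     "mmycolumn": [{"field": {"field": "a"}}, {"field": {"field": "signal"}}, {"field": None}]},
    {"created_year": 2022, "created_month": 7, "created_day": 5,
     "mmycolumn": [{"field": {"field": "b"}}, None]},
    {"created_year": 2022, "created_month": 7, "created_day": 4,
     "mmycolumn": [{"field": {"field": "c"}}]},
    {"created_year": 2022, "created_month": 7, "created_day": 5, "mmycolumn": None},
]

day_5 = {"created_year": 2022, "created_month": 7, "created_day": 5}
print(count_matching(sample_rows, "mmycolumn", "signal", day_5))  # 2
```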
Vaibhav_552636
by New Contributor II
  • 2258 Views
  • 0 replies
  • 2 kudos

Delta table merge operation log output does not show the correct number of updated records?

Hi all, I am performing a merge operation on my Delta table in Spark. I have an existing Delta table that already has some records. Now I created another DataFrame from a CSV file, added one new record and updated one record in it. Please check below sn...

Screenshots: initial Delta table; updated source table for merge; merge statement.
LearningDatabri
by Contributor II
  • 9090 Views
  • 7 replies
  • 2 kudos

Resolved! Unable to read file from S3

I tried to read a file from S3, but facing the below error:org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 53.0 failed 4 times, most recent failure: Lost task 0.3 in stage 53.0 (TID 82, xx.xx.xx.xx, executor 0): com...

Latest Reply
Sivaprasad1
Valued Contributor II
  • 2 kudos

Which DBR version are you using? Could you please test it with a different DBR version, for example DBR 9.x?

6 More Replies
THIAM_HUATTAN
by Valued Contributor
  • 3431 Views
  • 4 replies
  • 6 kudos

Resolved! Why is this not able to go through?

https://textdoc.co/index.php/UFEQdwxWn60LtOVf
Error: https://textdoc.co/index.php/3JisnHKGkvLIaAOF

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 6 kudos

It would be best if you used the Databricks ML runtime (in cluster settings), not the standard one.

3 More Replies
THIAM_HUATTAN
by Valued Contributor
  • 2486 Views
  • 2 replies
  • 0 kudos

Resolved! Save data from Spark DataFrames to TFRecords

https://docs.microsoft.com/en-us/azure/databricks/_static/notebooks/deep-learning/tfrecords-save-load.html

I could not run Cell #2:
java.lang.ClassNotFoundException: --------------------------------------------------------------------------- Py4JJ...

Latest Reply
jose_gonzalez
Databricks Employee
  • 0 kudos

Hi @THIAM HUAT TAN, which DBR version are you using? Are you using the ML runtime?

1 More Replies
User16826994223
by Honored Contributor III
  • 2891 Views
  • 1 replies
  • 1 kudos

Unity Catalog will allow you to bring your own HMS

Does anyone know more about how Unity Catalog will allow you to bring your own HMS (e.g. Glue)? Will this be treated as a separate 'catalog', which you can access but on which you can't use the other features of Unity Catalog, e.g. ABAC? Any reading on this top...

Latest Reply
zpappa
Databricks Employee
  • 1 kudos

@Kunal Gaurav Yes, it is treated as a synthetic catalog. You can query it by using the convention "hive_metastore" as the catalog name, i.e. SELECT * FROM hive_metastore.schema_name.table_name. This will work for internal HMS, external HMS and Glue. Yo...

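Because the existing metastore is exposed as a catalog named `hive_metastore`, its tables are addressed with a three-level `catalog.schema.table` name. A small, hypothetical Python helper for building such names might look like this; it quotes each part with backticks and doubles any embedded backtick, following Spark SQL identifier quoting, and `sales` and `orders` are placeholder names.

```python
def quote_identifier(part: str) -> str:
    """Backtick-quote one identifier part, escaping embedded backticks by doubling them."""
    return "`" + part.replace("`", "``") + "`"


def table_name(schema: str, table: str, catalog: str = "hive_metastore") -> str:
    """Build a three-level catalog.schema.table name; defaults to the hive_metastore catalog."""
    return ".".join(quote_identifier(p) for p in (catalog, schema, table))


def select_all(schema: str, table: str, catalog: str = "hive_metastore") -> str:
    """Return a SELECT * statement for the fully qualified table."""
    return f"SELECT * FROM {table_name(schema, table, catalog)}"


print(select_all("sales", "orders"))
# SELECT * FROM `hive_metastore`.`sales`.`orders`
```

In a notebook, the generated string could be passed to `spark.sql(...)`; passing a different `catalog` targets a Unity Catalog catalog with the same helper.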
PP1
by New Contributor II
  • 3559 Views
  • 2 replies
  • 2 kudos
Latest Reply
zpappa
Databricks Employee
  • 2 kudos

@Prashanth P We offer a fully featured REST API with Unity Catalog that provides the ability to CRUD objects such as catalogs/schemas/tables/ACLs/lineage etc. Companies like Collibra/Alation/MS Purview etc. use these in middleware integrations to integ...

1 More Replies
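As an illustration of the REST API mentioned in the reply, the sketch below builds (but does not send) a request to list catalogs. It assumes the Unity Catalog `GET /api/2.1/unity-catalog/catalogs` endpoint, personal access token (Bearer) authentication, and a response body containing a `catalogs` array; the workspace URL and token are hypothetical placeholders, and these details should be checked against the current Databricks REST API reference.

```python
import json
import urllib.request

WORKSPACE_URL = "https://adb-1234567890123456.7.azuredatabricks.net"  # hypothetical placeholder


def list_catalogs_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build a GET request for the (assumed) Unity Catalog list-catalogs endpoint; not sent here."""
    url = workspace_url.rstrip("/") + "/api/2.1/unity-catalog/catalogs"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")


def catalog_names(response_body: bytes) -> list:
    """Extract catalog names, assuming a JSON body shaped like {"catalogs": [{"name": ...}, ...]}."""
    payload = json.loads(response_body)
    return [c["name"] for c in payload.get("catalogs", [])]


req = list_catalogs_request(WORKSPACE_URL, "<personal-access-token>")
print(req.get_method(), req.full_url)
# To call the API for real:
# with urllib.request.urlopen(req) as resp:
#     print(catalog_names(resp.read()))
```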
