Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Phani1
by Valued Contributor II
  • 3725 Views
  • 2 replies
  • 0 kudos

SUBNET_EXHAUSTED_FAILURE(CLOUD_FAILURE): No more address space to create NIC within injected virtual network

Currently we are using an all-purpose compute cluster. When we tried to allocate the scheduled jobs to a job cluster, we were blocked by the following error: SUBNET_EXHAUSTED_FAILURE(CLOUD_FAILURE): azure_error_code: SubnetIsFull, azure_error_message: No mo...

Latest Reply
daniel_sahal
Esteemed Contributor

Answering your questions: yes, your VNet/subnet is out of unoccupied IPs, and this can be fixed by allocating more IPs to your network address space. Each cluster node requires its own IP, so if none are available, the cluster simply cannot start.
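
A quick way to estimate headroom is to count usable addresses per subnet; a minimal sketch, assuming Azure's five reserved addresses per subnet and hypothetical CIDRs:

import ipaddress

# Azure reserves 5 addresses in every subnet (network, gateway, 2x DNS, broadcast)
AZURE_RESERVED = 5

def usable_ips(cidr: str) -> int:
    return ipaddress.ip_network(cidr).num_addresses - AZURE_RESERVED

# The two subnets a VNet-injected workspace uses (host and container); CIDRs are hypothetical
for subnet in ["10.0.1.0/24", "10.0.2.0/24"]:
    print(subnet, "->", usable_ips(subnet), "usable IPs")

A /24 yields 251 usable addresses, so widening each subnet (for example to /23) roughly doubles the number of nodes that can start.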

1 More Replies
lewit
by New Contributor II
  • 2197 Views
  • 2 replies
  • 1 kudos

Is it possible to create a feature store training set directly from a feature store table?

Rather than joining features from different tables, I just wanted to use a single feature store table and select some of its features, but still log the model in the feature store. The problem I am facing is that I do not know how to create the train...

Latest Reply
Debayan
Databricks Employee

Hi, could you please refer to https://docs.databricks.com/machine-learning/feature-store/train-models-with-feature-store.html#create-a-trainingset-using-the-same-feature-multiple-times and let us know if this helps.
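
A minimal sketch of that approach with the FeatureStoreClient, assuming hypothetical table, column, and label names; a single FeatureLookup against one feature table is enough to build a TrainingSet:

from databricks.feature_store import FeatureStoreClient, FeatureLookup

fs = FeatureStoreClient()

# Select a subset of features from a single feature table (names are hypothetical)
lookups = [
    FeatureLookup(
        table_name="feature_db.customer_features",
        feature_names=["age", "tenure_days"],
        lookup_key="customer_id",
    )
]

# label_df is a hypothetical DataFrame holding the lookup key and the label column
training_set = fs.create_training_set(
    df=label_df,
    feature_lookups=lookups,
    label="churned",
)
train_df = training_set.load_df()

Passing the same training_set to fs.log_model then records the model in the feature store with lineage back to the feature table.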

1 More Replies
gpzz
by New Contributor II
  • 2227 Views
  • 2 replies
  • 1 kudos

MEMORY_ONLY not working

val doubledAmount = premiumCustomers.map(x => (x._1, x._2 * 2)).persist(StorageLevel.MEMORY_ONLY)
error: not found: value StorageLevel

Latest Reply
Chaitanya_Raju
Honored Contributor

Hi @Gaurav Poojary​, this works for me without any issues once the import is in place: the error "not found: value StorageLevel" just means the class has not been imported, so add import org.apache.spark.storage.StorageLevel before calling persist. Happy Learning!!

1 More Replies
bozhu
by Contributor
  • 2268 Views
  • 3 replies
  • 3 kudos

Set taskValues in DLT notebooks

Is "setting taskValues in DLT workbooks" supported?I tried setting a task value in a DLT workbook, but it does not seem supported, so downstream workbooks within the same workflows job cannot consume this task value.

Latest Reply
Lê_Ngọc_Lợi
New Contributor III

I have the same issue. I also want to know whether Databricks supports taskValues between a job task and a DLT pipeline or not.
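
For reference, a minimal sketch of how task values flow between standard (non-DLT) notebook tasks in the same job run, with hypothetical task keys; as noted above, this does not appear to carry over to DLT pipeline tasks:

# In an upstream notebook task (task key "ingest"):
dbutils.jobs.taskValues.set(key="row_count", value=42)

# In a downstream notebook task of the same job run:
row_count = dbutils.jobs.taskValues.get(
    taskKey="ingest",
    key="row_count",
    default=0,     # used when the upstream task set no value
    debugValue=0,  # used when running the notebook interactively
)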

2 More Replies
Vik1
by New Contributor II
  • 10587 Views
  • 3 replies
  • 5 kudos

Some very simple functions in Pandas on Spark are very slow

I have a pandas on Spark dataframe with 8 million rows and 20 columns. It took 3.48 minutes to run df.shape, and running df.head also took a long time: 4.55 minutes. By contrast, df.var1.value_counts().reset_index() took only 0.18 sec...

Latest Reply
PeterDowdy
New Contributor II

The reason this is slow is that pandas on Spark needs an index column to perform `shape` or `head`. If you don't provide one, pyspark pandas enumerates the entire dataframe to create a default one. For example, given columns A, B, and C in dataframe `d...
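
Two common mitigations, a minimal sketch assuming a Spark DataFrame sdf with a unique id column:

import pyspark.pandas as ps

# Option 1: use a default index type that avoids enumerating the whole dataframe
# (note: "distributed" produces non-consecutive, non-deterministic index values)
ps.set_option("compute.default_index_type", "distributed")

# Option 2: designate an existing column as the index when converting
psdf = sdf.pandas_api(index_col="id")
print(psdf.shape)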

2 More Replies
sunil_smile
by Contributor
  • 5917 Views
  • 2 replies
  • 1 kudos

VNet peering settings are not enabled in Azure Databricks Premium, even though it is deployed inside my VNet?

Hi All, VNet peering settings are not enabled in Azure Databricks, even though it is deployed inside my VNet? Here I have not mentioned my VNet and subnet details, but I filled these in and created Databricks (without private endpoint - allow public access) virtual...

Latest Reply
Debayan
Databricks Employee

Hi, VNet peering is not supported or possible on VNet-injected workspaces. Please refer to: https://learn.microsoft.com/en-us/azure/databricks/administration-guide/cloud-configurations/azure/vnet-peering#requirements

1 More Replies
patdev
by New Contributor III
  • 2623 Views
  • 2 replies
  • 2 kudos

Load new data into a Delta table

Hello all, I want to know how to load new data into a Delta table from a new CSV file. Here is the code that I used to create the Delta table from a CSV file and load the data. But I have got a new updated file and am trying to load the new data, but I am not able to; any gui...

Latest Reply
patdev
New Contributor III

Thank you, I tried that and it ended in an error. The Delta table was created from CSV, which must have been converted to Parquet files, and all the columns are varchar or string. So now if I want to load the new file it ends in an incompatibility error for da...
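
A minimal sketch of the usual append pattern, with a hypothetical path and table name; casting everything to string first sidesteps the type mismatch against an all-string Delta schema:

from pyspark.sql import functions as F

# Read the new CSV drop (path is hypothetical)
new_df = (
    spark.read.format("csv")
    .option("header", "true")
    .load("/mnt/raw/new_file.csv")
)

# Cast every column to string to match the existing all-string target schema
new_df = new_df.select([F.col(c).cast("string") for c in new_df.columns])

# Append to the Delta table; mergeSchema tolerates newly added columns
(
    new_df.write.format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .saveAsTable("my_delta_table")
)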

1 More Replies
sunil_smile
by Contributor
  • 15192 Views
  • 8 replies
  • 10 kudos

Resolved! How can I add ADLS Gen2 OAuth 2.0 as cluster scope for my High Concurrency shared cluster (without Unity Catalog)?

Hi All, kindly help me: how can I add ADLS Gen2 OAuth 2.0 authentication to my High Concurrency shared cluster? I want to scope this authentication to the entire cluster, not to a particular notebook. Currently I have added them as Spark configuration o...

Latest Reply
Hubert-Dudek
Esteemed Contributor III

The error is because of missing default settings (create a new cluster and do not remove them); the warning is because secrets should be put in a secret scope, and then you should reference the secrets in the settings.
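
A minimal sketch of the cluster-level Spark config, with hypothetical storage account, secret scope, key names, and tenant ID; the {{secrets/<scope>/<key>}} syntax tells Databricks to resolve each value from a secret scope instead of storing it in plain text:

spark.hadoop.fs.azure.account.auth.type.mystorageacct.dfs.core.windows.net OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.mystorageacct.dfs.core.windows.net org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.id.mystorageacct.dfs.core.windows.net {{secrets/my-scope/sp-client-id}}
spark.hadoop.fs.azure.account.oauth2.client.secret.mystorageacct.dfs.core.windows.net {{secrets/my-scope/sp-client-secret}}
spark.hadoop.fs.azure.account.oauth2.client.endpoint.mystorageacct.dfs.core.windows.net https://login.microsoftonline.com/<tenant-id>/oauth2/token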

7 More Replies
JesseS
by New Contributor II
  • 7429 Views
  • 2 replies
  • 1 kudos

Resolved! How to extract source data from on-premise databases into a data lake and load with AutoLoader?

Here is the situation I am working with. I am trying to extract source data using the Databricks JDBC connector, with SQL Server databases as my data source. I want to write those into a directory in my data lake as JSON files, then have AutoLoader ing...

Latest Reply
Aashita
Databricks Employee

To add to @werners point, I would use ADF to load SQL Server data into ADLS Gen2 as JSON, then load these raw JSON files from your ADLS base location into a Delta table using Auto Loader. Delta Live Tables can be used in this scenario. You can also reg...
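
A minimal sketch of the Auto Loader leg, assuming hypothetical ADLS paths and table name:

# Incrementally ingest the JSON files that ADF lands in the lake
(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.schemaLocation", "abfss://lake@acct.dfs.core.windows.net/_schemas/orders")
    .load("abfss://lake@acct.dfs.core.windows.net/raw/orders/")
    .writeStream
    .option("checkpointLocation", "abfss://lake@acct.dfs.core.windows.net/_checkpoints/orders")
    .trigger(availableNow=True)
    .toTable("bronze.orders")
)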

1 More Replies
databicky
by Contributor II
  • 1193 Views
  • 1 reply
  • 0 kudos

Resolved! How to create a border for some specific cells?

I tried some code to create a border in an Excel sheet. For a particular cell I am able to write it, but when I try it with a set of cells it shows an error.

Latest Reply
Chaitanya_Raju
Honored Contributor

Hi @Mohammed sadamusean​, can you please try something similar to the below code using loops? I have implemented a similar use case that might be useful; please let me know if you need further help.
top = Side(border_style = 'thin', color = '00000000')
bottom = Side(bor...
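
A minimal sketch of bordering a whole range with openpyxl, using a hypothetical workbook and range; iterating the range applies the style cell by cell, which avoids the error seen when styling a multi-cell range directly:

from openpyxl import Workbook
from openpyxl.styles import Border, Side

wb = Workbook()
ws = wb.active

thin = Side(border_style="thin", color="00000000")
border = Border(top=thin, bottom=thin, left=thin, right=thin)

# Apply the border to every cell in the range (range is hypothetical)
for row in ws["B2:D5"]:
    for cell in row:
        cell.border = border

wb.save("bordered.xlsx")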

sreedata
by New Contributor III
  • 4957 Views
  • 4 replies
  • 7 kudos

Resolved! Getting status of "If Condition" Activity into a variable

"If Condition" has lot of activities that can succeeded or fail. If any activity fails then whole "If Condition" fails. I have to get the status of the "If Condition" activity (pass or fail) so that i can use it for processing in the next notebook t...

Latest Reply
UmaMahesh1
Honored Contributor III

In your ADF pipeline, set up two different pipeline activities from the If Condition activity based on success or failure (the green and red arrows). Then inside each pipeline activity, you can add a set variable, get variable and your adb note...

3 More Replies
Chanu
by New Contributor II
  • 2357 Views
  • 2 replies
  • 2 kudos

Databricks JAR task type functionality

Hi, I would like to understand Databricks JAR-based workflow tasks. Can I interpret JAR-based runs as something like a spark-submit on a cluster? In the logs, I was expecting to see the spark-submit --class com.xyz --num-executors 4 etc. And the...

Latest Reply
Chanu
New Contributor II

Hi, I did try using Workflows > Jobs > Create Task > JAR task type, uploaded my JAR and class, created a job cluster, and tested this task. This JAR reads some tables as input, does some transformations, and writes some other tables as output. I would like t...
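
For reference, a JAR task is conceptually close to a spark-submit of the given main class on the job cluster, though the driver logs will not show a literal spark-submit command line. A minimal sketch of the equivalent Jobs API 2.1 payload, with hypothetical names and values:

{
  "name": "jar-task-demo",
  "tasks": [
    {
      "task_key": "transform",
      "spark_jar_task": {
        "main_class_name": "com.xyz.Main",
        "parameters": ["--run-date", "2023-01-01"]
      },
      "libraries": [{"jar": "dbfs:/jars/myjob.jar"}],
      "new_cluster": {
        "spark_version": "11.3.x-scala2.12",
        "node_type_id": "Standard_DS3_v2",
        "num_workers": 4
      }
    }
  ]
}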

1 More Replies
pasiasty2077
by New Contributor
  • 8161 Views
  • 1 reply
  • 1 kudos

Partition filter is skipped when table is used in where condition, why?

Hi, maybe someone can help me. I want to run a very narrow query:
SELECT * FROM my_table WHERE snapshot_date IN ('2023-01-06', '2023-01-07')
-- part of the physical plan:
-- Location: PreparedDeltaFileIndex [dbfs:/...]
-- PartitionFilters: [cast(snaps...

Latest Reply
-werners-
Esteemed Contributor III

No hints on partition pruning AFAIK. The reason the partitions were not pruned is that the second query generates a completely different plan. To be able to filter the partitions, a join first has to happen, and in this case it means the table has...
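
One common workaround, a minimal sketch assuming the filter values live in a small hypothetical dimension table: collect them to the driver first, so the main query filters on literals and static partition pruning applies:

from pyspark.sql import functions as F

# Materialize the small set of dates on the driver (table name hypothetical)
dates = [r["snapshot_date"] for r in spark.table("date_filter").collect()]

# Filtering on literals lets Delta prune partitions statically
df = spark.table("my_table").where(F.col("snapshot_date").isin(dates))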

sudhanshu1
by New Contributor III
  • 3796 Views
  • 4 replies
  • 2 kudos

Resolved! DLT workflow failing to read files from AWS S3

Hi All, I am trying to read streams directly from AWS S3. I set the instance profile, but when I run the workflow it fails with the below error: "No AWS Credentials provided by TemporaryAWSCredentialsProvider : shaded.databricks.org.apache.hadoop.fs.s3a.C...

Latest Reply
Vivian_Wilfred
Databricks Employee

Hi @SUDHANSHU RAJ​, is UC enabled on this workspace? What is the access mode set on the cluster? Is this coming from the metastore or directly when you read from S3? Is the S3 bucket cross-account?
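
For context, a minimal sketch of the kind of DLT source being discussed, with a hypothetical bucket; when the pipeline's cluster carries the instance profile, no explicit AWS credentials should be needed in the code:

import dlt

@dlt.table
def raw_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("s3://my-bucket/raw/events/")  # bucket and prefix are hypothetical
    )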

3 More Replies
alxsbn
by Contributor
  • 2980 Views
  • 2 replies
  • 2 kudos

Resolved! Auto Loader on a CSV file didn't infer cells containing JSON data well

Hello! I am playing with Auto Loader schema inference on a big S3 repo with 300+ tables and large CSV files. I'm looking at Auto Loader with great attention, as it can be a great time saver for our ingestion process (data comes from a transactional DB gen...

Latest Reply
daniel_sahal
Esteemed Contributor

PySpark by default uses \ as the escape character. You can change it to ". Doc: https://docs.databricks.com/ingestion/auto-loader/options.html#csv-options
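
A minimal sketch of overriding the escape character on an Auto Loader CSV read, with hypothetical paths:

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("escape", '"')  # use " as the escape character so embedded JSON parses cleanly
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/events")
    .load("s3://my-bucket/raw/events/")
)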

1 More Replies
